Filter-based feature selection methods such as information gain, Gini index, and gain ratio are widely used in machine learning. These methods are often assumed to select the most accurate features, but we show that this assumption does not always hold. In this thesis, we study cases in which these feature selection metrics and accuracy exhibit “misorderings”: given a pair of features F1 and F2 where F1 has a higher accuracy than F2, the feature selection metric assigns a higher value to F2 than to F1. First, we study the frequency of misorderings in randomly generated synthetic data. Second, we study the potential for misordering as two key parameters of the features in a dataset are varied. Finally, we study misorderings in real data and show that they are prevalent there as well. Based on our results, we observe that different metrics exhibit different misordering rates, and that imposing redundancy-elimination criteria may have the side effect of reducing misordering.
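To make the notion of a misordering concrete, the following is a minimal sketch on a small, hand-constructed toy dataset (not data from the thesis). It assumes that the "accuracy" of a single feature means the training accuracy of predicting the majority class within each feature value (a one-level decision stump); the thesis may define accuracy differently. The Gini index is scored here as the reduction in Gini impurity, and gain ratio as information gain divided by the entropy of the feature's own value distribution (split information). All function names are illustrative, not taken from the thesis.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a sequence of discrete values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def partition(feature, labels):
    """Group class labels by the value the feature takes on each example."""
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    return groups


def impurity_gain(feature, labels, impurity):
    """Impurity of the labels minus the size-weighted impurity after splitting on the feature."""
    n = len(labels)
    groups = partition(feature, labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def information_gain(feature, labels):
    return impurity_gain(feature, labels, entropy)


def gini_gain(feature, labels):
    return impurity_gain(feature, labels, gini)


def gain_ratio(feature, labels):
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def stump_accuracy(feature, labels):
    """Assumed accuracy: predict the majority class within each feature value."""
    groups = partition(feature, labels)
    correct = sum(max(Counter(g).values()) for g in groups.values())
    return correct / len(labels)


def is_misordered(f1, f2, labels, metric):
    """True if the metric strictly ranks the two features opposite to their accuracy."""
    a1, a2 = stump_accuracy(f1, labels), stump_accuracy(f2, labels)
    m1, m2 = metric(f1, labels), metric(f2, labels)
    return (a1 - a2) * (m1 - m2) < 0


# Toy data: 50 positive examples followed by 50 negative examples.
labels = [1] * 50 + [0] * 50
# F1: value 0 -> 40 pos / 10 neg, value 1 -> 10 pos / 40 neg (accuracy 0.80).
f1 = [0] * 40 + [1] * 10 + [0] * 10 + [1] * 40
# F2: value 0 -> 0 pos / 25 neg (pure), value 1 -> 50 pos / 25 neg (accuracy 0.75).
f2 = [1] * 50 + [0] * 25 + [1] * 25

for name, metric in [("information gain", information_gain),
                     ("Gini gain", gini_gain),
                     ("gain ratio", gain_ratio)]:
    print(name, round(metric(f1, labels), 3), round(metric(f2, labels), 3),
          is_misordered(f1, f2, labels, metric))
```

In this toy case F1 is the more accurate feature (0.80 versus 0.75), yet information gain (about 0.278 versus 0.311) and gain ratio both rank F2 higher, because F2 isolates a perfectly pure subset. Gini gain (0.18 versus about 0.167) happens to preserve the accuracy ordering here, which illustrates how different metrics can misorder different pairs of features.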